% this file is called up by thesis.tex
% content in this file will be fed into the main document
\ifpdf
\graphicspath{{5/figures/PNG/}{5/figures/PDF/}{5/figures/}}
\else
    \graphicspath{{5/figures/EPS/}{5/figures/}}
\fi

\setcounter{chapter}{4}
% CHAPTER 5
\chapter{From Macro to Micro: Four factors influencing email reciprocity}\label{chap:Chp5} % top level followed by section, subsection

\begin{quotes}
Rigid, the skeleton of habit alone upholds the human frame \\
\attrib{{Virginia} \citet{woolf2012}   }
\end{quotes}

\begin{quotes}[Structures are] `machines for the suppression of time.'\\
\attrib{My paraphrase on {Claude L\'{e}vi-Strauss} \citep{giddens1979}   }
\end{quotes}

%\begin{quotes}
%Not men and their moments but moments and their men \\
%\attrib{{Erwin} \citet{goffman1967}   }
%\end{quotes}


\begin{quotes}
The rumours then spread, like the spillway plunging down the street to the jetty, fanned out, here and there moving more swiftly and forming new branches, elsewhere coming to a standstill and drying up \ldots And so the rumours were transformed, further embroidered upon or attenuated, sometimes even refuted. Yet they persisted as the cocoon for a single statement, concealing its larva within, and no one knew what might yet come creeping out. The statement was: Naso [Ovid] is dead. \\
\attrib{ \citep{ransmayr1996}   }
\end{quotes}


\section{Introduction}
This second empirical chapter explores a simple question: what is the likelihood of receiving a reply to a given email? Investigating  this question continues the line of empirical inquiry begun in the previous chapter, but with noticeable differences:
\begin{enumerate}

\item \textit{\textbf{Explicit links between transactions.}} The theoretical discussion about social transactions (see section~\ref{sec:Chp2Transactions}) highlighted one of their defining features, namely the notion that they do not merely reflect or reinforce links between people, but that the transactions themselves are linked to one another. In an exchange of emails, for example, senders and recipients are associated with one another, but so are the messages, each message a possible response to a previous one and a potential stimulus to the next in a chain of related messages. This interdependency between separate social transactions is part of what makes them `social.'\footnote{Recall the argument from the theoretical chapter (Chapter 2) about the difference between a social transaction and what we might call a non-social `behaviour.' The example given in this context was the difference between a `wink' and a `blink,' the former regarded as `social' because it was done with `reason and purpose,' an invitation for a subsequent and related social action. What makes an action `social,' we concluded, is its potential to be causally linked to subsequent and/or prior social actions.} In the previous chapter these inter-transactional links are inferred implicitly, as a way to explain the empirical findings. To explain why private emails contribute more reciprocity and less transitivity than broadcast emails, an explanatory mechanism was suggested according to which stimulus-response links depend on whether the stimulus was a private or a broadcast message. But this was an inferred explanation, not directly observed in the dataset. In contrast, by asking what elicits a reply to an email, this chapter seeks direct observation and explicit measurement of stimulus-response links.
\item \textit{\textbf{Emphasis on macro-to-micro mechanisms.}}  The previous chapter seeks to explain structural topology at the level of the network by recourse to the way people responded to incoming emails. The emphasis there is on micro-to-macro type mechanisms. In contrast, by asking what contributes to the likelihood of eliciting a reply, the current chapter takes the structure of the network, the individuals, ties and even the stimuli transactions as exogenously given, and seeks to trace how a recipient makes the decision about the (best) course of action to take, whether to reply or not. 
\item \textit{\textbf{The nature of ties.}} 
This chapter continues the previous one on yet another level, holding that communication transactions observed in the empirical data are linked to social ties and their properties. For example, the previous chapter made an argument that private emails were associated with stronger ties, whereas broadcast emails were associated with weaker ones. This (imputed!) association between tie strength and email type explains both higher levels of reciprocity in private emails and the narrower out-degree distribution of the networks from which they are constructed. This chapter examines the properties of ties that contribute to the rate of reply to a given email. In both of the empirical chapters, the properties of the tie constitute a higher level of analysis that serves to explain the phenomenon under observation.

%For example, recall from the degree distribution analysis in the previous chapter that private emails tend to connect individuals with a relatively small sub-group of their direct contacts, whereas broadcast emails typically involve a much larger range of contacts. This raises the possibility that when sending and receiving broadcast-emails, people engage in weaker, more superficial social ties than when sending private mails. The tie-strength assumption was also useful in explaining why private messages are reciprocated more frequently than broadcast ones  \citep{granovetter1973}. Thus micro-level transaction activity was linked to a meso-level tie property, namely the strength of ties. As we shall see, this chapter continues to explore the link between the micro-level and the meso-level of aggregation. 
\end{enumerate}

\noindent The rest of the chapter is organized as follows. The next section describes how chains of stimulus-response messages are operationalized in this email dataset. This is followed by a description of the multilevel model and of the way the data was sampled, as well as the results. The chapter ends with a summary and some reflections.

\section{Operationalizing stimulus-response chains}\label{sec:Chp5Operationalizing}
  
The point of departure is the notion that email communication consists of chains of interrelated social transactions. Just like conversational exchanges \citep{gibson2000,gibson2005} in which people take turns producing speech utterances in a sequence, each email serves as an invitation or stimulus for the next email in the chain, possibly setting in motion a series of related emails that bounce back and forth between actors, at times involving more people in the process, at times leaving some actors out. What we have here is something akin to a process of diffusion, but slightly more general. Diffusion processes typically change the properties of individuals: when a disease spreads, for example, people become carriers of a certain virus. But in this case we are talking more generally about a process that involves collaboration, discussion, advice or any type of social intercourse that transcends the level of a single message and extends over a certain length of time. While exchanging ideas, participants might even move from one topic of discussion to another. There is no requirement for anything tangible to actually spread in the network, only for a sequence of stimulus-response links between transactions, like a row of falling dominoes, which continues until it stops after a certain period of time \citep{barabasi2005}. This chapter probes the mechanisms that underlie this process and models the conditions that facilitate its unfolding.



%discussed in the theoretical chapter (Chapter 2), the difference between a social (trans)action and a non-social behaviour, the distinction between a wink and a blink \citep[p 15]{oakeshott1991}, is that the former is done with intention and purpose, and is therefore functionally linked to another transaction. 
 
%Moreover, it qualifies the very meaning of some of the concepts we use. Consider the term reciprocity, for example. At the meso-level of ties, this term holds when something (information, messages, money etc) flows in both directions. But we do not necessarily need to know whether there is any (causal) connection between the flow in one direction and the other. In the micro-level of transaction, reciprocity holds only when one message 


Empirically, the basic building block for such a process, the `micro-case' if you will, is the link between an email `stimulus' and a subsequent email `response.' Note here the difference between two terms, \textit{response} and \textit{reply} \citep{goffman1976}, the former being a more general case of the latter. Upon receiving an email stimulus, a recipient's response could be an email reply to its sender. But it could be something completely different. In its most general form a response could be one of the following: (1) hitting the `reply' control button dispatches a reply exclusively to the sender of the stimulus email, (2) hitting the `reply all' control button dispatches a message to the sender of the original message and to all co-recipients, (3) hitting the `forward' control button dispatches an email to any third party, and (4) some completely different kind of response that perhaps has nothing to do with email. Let us discuss the first three types of responses in greater detail.

\begin{enumerate}
\item The first type of response contributes to reciprocity in the network without changing the level of transitivity (when transitivity is measured, the direction in which a message is sent is ignored).

\item A reply-all response contributes to the reciprocity in the network, too, but it also contributes to network transitivity. Consider a group of $n$ members. One of the members sends an email to all the other $n-1$ members of the group, creating a star-like network with zero transitivity and zero reciprocity. Now, one of the recipients of this first email hits `reply-all,' effectively creating $n-1$ copies of her reply, one landing in the inbox of the sender of the original email, and one landing in the inbox of each of the other $n-2$ members of the group. This transaction contributes a single reciprocated tie (the one connecting her with the sender of the first email) and an additional $n-2$ closed triangles to the network (all triangles sharing the same base, namely the tie connecting the sender of the first email with the sender of the second). When a third group member hits `reply-all,' she contributes two reciprocated ties (the ties connecting her to the first and to the second senders). In addition, $2\left( n-3\right) $ new triangles appear, two for each of the $n-3$ members who have not yet sent a message. If all recipients hit `reply-all,' the graph becomes complete: each and every dyad has exchanged messages in both directions, and both the transitivity and the reciprocity of the network have reached the maximum value of $1$.

\item Besides the `reply' and the `reply-all' responses, the `forward' response may or may not contribute to  the network's reciprocity and to its transitivity, depending on other, possibly unrelated transactions. Moreover, there are of course hybrid transactions; one might hit `reply-all' and then remove some of the recipients or even add some that were not in the original list.


\end{enumerate}
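
\noindent To make the reply-all case concrete, the following worked calculation is a sketch under definitions that are assumed here rather than fixed in the text: reciprocity $R$ is taken as the share of connected dyads with messages in both directions, and transitivity $T$ as three times the number of closed triangles divided by the number of connected triples (paths of length two), with direction ignored. After the original message and a single reply-all in a group of $n \ge 3$ members, there are $(n-1)+(n-2) = 2n-3$ connected dyads, one of which is reciprocated, and $n-2$ closed triangles. The original sender and the replier each have $n-1$ contacts, and each of the remaining $n-2$ members has two, so the number of connected triples is $2\cdot\frac{(n-1)(n-2)}{2} + (n-2) = n(n-2)$. Hence
\[
R = \frac{1}{2n-3}, \qquad T = \frac{3\,(n-2)}{n\,(n-2)} = \frac{3}{n}.
\]
For $n = 4$ this gives $R = 1/5$ and $T = 3/4$. Both measures reach $1$ only once the graph is complete.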


Of these different responses to an email stimulus, this chapter focuses on the first two. A technical problem is how to establish that one email is a reply to another. This problem is addressed by searching for common subject fields: if two messages share the same subject field, the first sent from $A$ to $B$ and the second sent at a later time back from $B$ to $A$, these two emails are defined as related.\footnote{Two types of concerns might be raised here. A \textit{false-negative} occurs when someone replies to an email but completely changes its subject title. This would most probably be considered a reply, but it would not be identified as one since the messages do not share a common subject field. Perhaps a more serious issue is the \textit{false-positive,} when people exchange completely unrelated emails that share the same subject field. This may happen because they initiated a new email by hitting the reply control button associated with an old, completely unrelated email, without bothering to change the subject field, or because they use generic subjects (such as `stuff' or `hi'). To control for this issue, the subject lines were read, and in case of doubt unrelated emails were removed. However, we should consider the possibility that both concerns might have an effect on the results.} Thus, three conditions must be met: (1) stimulus and response emails must have the same subject line (ignoring prefixes such as `Re'), (2) the identities of the sender and the recipient in the stimulus must appear in reverse in the response, and (possibly) (3) stimulus and response emails should not be separated by a long time interval \citep{gibson2005}.
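
\noindent These conditions can be summarized as a single rule. The notation is assumed here for illustration only: for an email $e$, let $s(e)$ be its sender, $R(e)$ its set of recipients, $t(e)$ the time at which it was sent, and $\mathrm{subj}(e)$ its subject line with prefixes such as `Re' removed; $\Delta$ is the maximal admissible delay. An email $e'$ is then treated as a reply by recipient $r \in R(e)$ to the stimulus $e$ if
\[
s(e') = r, \qquad s(e) \in R(e'), \qquad \mathrm{subj}(e') = \mathrm{subj}(e), \qquad 0 < t(e') - t(e) \le \Delta .
\]
The second condition covers both the `reply' and the `reply-all' case, since in either case the original sender is among the recipients of the response.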



Now, whether or not a `stimulus' is effective in eliciting such a response depends on various factors. Four factors are considered: (1) properties of the email's sender (sender effect), (2) properties of its recipients (recipient effect), (3) properties unique to each sender-recipient dyad (dyad effect) and (4) properties of the email that may or may not trigger a reply (stimulus effect). Some of these factors are known from studies of the dynamics of speech exchanges and small group research \citep[cf.][]{gibson2000, gibson2005, goffman1976}. In what follows, we focus on email communication and seek to isolate and compare the marginal effects of these different factors.

The sender and recipient effects refer to the specific properties of the senders and recipients of the original stimulus email. Certain individuals may have a high {\it sender effect} if most of their emails tend to be highly effective in eliciting replies. A high sender effect can simply be explained by the organizational position of a sender. For example, a sender high in the organizational hierarchy might be too important to ignore, so recipients tend to reply to their emails more frequently than to emails sent by others. By the same token, a high {\it recipient effect} is an attribute of recipients who are generally more responsive than other actors in the network. This could happen, for example, if an actor occupies a role that requires her to be highly responsive to incoming emails.

After controlling for the effects of the individuals, most interesting is the {\it dyadic effect}, which allows for variations between different dyads that cannot be attributed to single actors. Two actors might have average sender and recipient effects, but when they send each other emails, they may be much more likely to reciprocate than otherwise. The dyadic effect allows for certain pairs of actors to be more (or less) responsive to each other's emails relative to their mean responsiveness. This could be due to unique features of their relationship such as its history. It could also be due to certain issues that bind them to each other (or separate them from one another), something special in the `social tie' connecting the two individuals that makes them more likely to act in a certain way towards one another, differently from what one might otherwise expect.



Finally, the {\it stimulus effect} allows for specific features of the message itself to stimulate replies to a degree that cannot be reduced to the actors or dyads. A high stimulus effect sets an email apart from other emails sent by the same sender to each of the other recipients: it may reflect the email's unique content, its timing, or a specific signal indicating whether or not the sender is awaiting a reply. Whatever the reasons, the method proposed here allows us to tease out and identify highly effective (or ineffective) actors, ties and emails in a systematic manner.


Whereas sender and recipient effects may depend, in part, on the absolute positions of actors in an organization, dyadic effects account for their relative positions. Finally, stimulus effects account for the idiosyncratic properties of specific transactions. The stimulus effect is an attempt to identify transactions with consequences that cannot be explained merely by the average behaviour of the actors involved or their relationships.\footnote{This allows for a consideration of Goffman's famous dictum \citep{goffman1967}: `Not men and their moments, rather moments and their men.'}

Estimating the variance that is associated with each of these four factors allows us to judge their relative salience. To this end, multilevel models are commonly used (see section~\ref{sec:Chp3Multilevel}): a statistical method that partitions variance when the data is structured hierarchically. Such a structure characterizes the sender, who may send multiple emails, the recipient, who may receive multiple emails, the sender-recipient dyad, which may be a conduit for multiple emails, and the email, which may be associated with multiple replies, one from each of its recipients. If we find one factor to be more important than the others, that is, if that factor is associated with a larger variance, we may suggest that this factor is critical for the unfolding of email exchange in this organization.

The following sections give a brief overview of the use of multilevel data models, with an emphasis on cross classification and multiple roles. A description of the sampling of the dataset (section~\ref{sec:Chp3Enron}) follows, together with the results of the fitted model.


\section{Modeling replies using multilevel analysis}

The method of multilevel analysis described in section~\ref{sec:Chp3Multilevel} refers to a family of regression estimation methods that trace the different sources of variability in the data. It is commonly applied to cases (the micro-level) that are nested within one or more categories (the macro-level). The objective is to partition the variability observed in the data into the parts explained by each type of category.

%The use of multilevel analysis has become an established practice in study of social networks \citep{snijders1999, snijders2003, Lazega2008, Duijn1999}. In a model closely related to the one presented here \citep{snijders1999}, a crossed factor analysis is used to model a continuous outcome variable, a proxy for the strength of a tie. 
Simple uses of multilevel models consist of micro-level cases that are nested within macro-level classes. The classical example is that of children's achievements in school. The variability between children could be large when taken as a group. But separating them into sub-groups allows one to tease out what part of the variability between students is due to variations between schools, and what part is due to differences between students within the schools. 



Crossed classified models \citep[Chapter 12]{goldstein2011} are used when micro-cases are embedded not in one but in two types of categories. Consider for example students' achievements in their finals. Suppose that the final examinations are standardized country-wide: each year, students in all schools throughout the country sit the exact same battery of exams. Now, as before, the variability can be separated according to schools, and the variability between students' achievements within a school may be smaller than the total variability of students' achievements.

\figuremacroWH{ImgAchievementInSchoolYear}{Crossed classified model: student's achievement nested in years and schools}{A student's achievement in nation-wide, yearly standardized exams constitutes a micro-level case, each such case nested within a specific year in which the exam was taken, and within the school in which the student was studying.}{.40}

In addition to the different properties of schools, the exams vary from year to year. Hence, variability in student performance could also be attributed to the year in which an exam was taken, the variability between students' achievements within each year being smaller than the total variability. Each school contains a large group of micro-cases and each year contains a large group of micro-cases, but the years and the schools are not nested within one another. The relationship between years and schools is not like the relationship between students and schools: the latter is a one-to-many relationship, whereas the former is many-to-many. Thus, crossed classified multilevel models comprise cases that are nested within cross-classifications of two or more differing hierarchies. Figure~\ref{ImgAchievementInSchoolYear} depicts the cross-level hierarchy.
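
\noindent As a sketch of what this cross-classification implies, and in notation that is assumed here for illustration rather than taken from the sources cited, the achievement $y_{i(jk)}$ of student $i$ in school $j$ and year $k$ can be written as an overall mean plus one residual per classification:
\[
y_{i(jk)} = \beta_0 + u_j^{school} + u_k^{year} + e_{i(jk)},
\]
\[
u_j^{school} \sim N(0, \sigma_{school}^2), \qquad
u_k^{year} \sim N(0, \sigma_{year}^2), \qquad
e_{i(jk)} \sim N(0, \sigma_e^2).
\]
If the three residuals are mutually independent, the total variance is $\sigma_{school}^2 + \sigma_{year}^2 + \sigma_e^2$, so that the share attributable to schools, for example, is $\sigma_{school}^2 / (\sigma_{school}^2 + \sigma_{year}^2 + \sigma_e^2)$.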


Just as a student's achievements are partly a product of the properties of the school in which she studied and of the year in which she took the exam, multilevel analysis of networks \citep{snijders1999, snijders2003, Lazega2008, Duijn1999} treats tie-level attributes as if they were partly the product of the two actors with which the tie is associated, as depicted in figure~\ref{ImgTieEmbeddedInActorsSneijdersAndKenny}. The hypothesis here is that all ties incident to one actor have something in common: their properties are influenced by the actor, and the variability between them is smaller than the overall variability of all the ties.

There are, however, two crucial differences between these two examples, students nested in years and schools, and ties nested within pairs of actors. One difference is that within each $\text{school}\times\text{year}$ combination there are multiple students (micro-cases), whereas in networks every pair of connected actors defines only one tie. That, however, does not present any technical difficulties. The more serious problem is that schools and years do not refer to the same entities, whereas the sender of a tie and its receiver could refer to the exact same entity. This requires the use of multiple roles models \citep[pp.~161--162]{snijders2011}, which allow for a possible correlation between two attributes of the same actor, once acting in the role of a receiver and once acting in the role of a sender.

\figuremacroW{ImgTieEmbeddedInActorsSneijdersAndKenny}{Crossed classified model: ties nested within individuals, as in \citet{snijders1999}}{The tie's strength constitutes a micro-level case, each case nested within the group of one actor's ties and the group of the second actor's ties.}{.40}

The multilevel model is constructed such that each actor can have multiple ties. The tie variable is treated as a micro-level case, nested within the set of ties belonging to each actor, and actors are therefore treated as the macro-level entities. We now want to adapt this model to our context, including not only ties between actors but also multiple transactions associated with every tie. %which is well known in the literature of statistics of social networks \citep{Duijn1999, Duijn2004, Zijlstra2005, Zijlstra2006, snijders1999}. The relevance of the model to the problem at hand stems from its multilevel approach: the outcome variable is defined at the level of the dyad, for example, it could be a binary variable denoting whether a tie exists or not \citep{Zijlstra2005} or it could be a continuous variable denoting the strength of a tie \citep{snijders1999}. The $p2$ model regresses the outcome on random and fixed effects at the level of the nodes. Since multiple dyads can be associated with each node, the dyad level is nested within the level of the node. 
%To include social transactions in the model, consider the following adaptation of the basic $p2$ model described above. Instead of having an outcome variable at the dyad level, the outcome variable is at the level of the single sender-recipient transaction \citep{denooy2011}. Since multiple transactions may be exchanged within each dyad, transactions can be seen as nested within the dyad and within its actors. 


The outcome is a binary variable, namely whether or not a stimulus email has prompted a reply from each of its recipients. Since the time-window of observation is constrained, there is a danger that a stimulus sent close to the end of the time-window will elicit a reply that will not be observed. To address this problem, two time-windows have been defined, both beginning at $00:00$ hours on September 1, 2001, with the first ending two weeks before the second and the second ending at $23:59$ hours on December 31, 2001. Stimuli were limited to emails sent within the first time-window, and replies were identified in the second time-window. Thus, even if a stimulus was sent at the very last moment of the first time-window, its reply could be identified, provided that it was sent within two weeks after the stimulus.
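
\noindent In symbols (the notation $T_1$, $T_2$, $t$ and $\delta$ is introduced here for illustration only), let $T_1$ and $T_2$ denote the ends of the first and second time-windows, with $T_2 - T_1 = 14$ days if `two weeks' is read as exactly fourteen days. For a stimulus sent at time $t \le T_1$ and a reply sent after a delay $\delta \le 14$ days,
\[
t + \delta \;\le\; T_1 + 14 \text{ days} \;=\; T_2 ,
\]
so the reply falls inside the second window. Under this reading, $T_1$ corresponds to $23:59$ hours on December 17, 2001.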


In the first and simplest model, this outcome is modelled against the sender and recipient random effects. Thus for each actor an  estimate is made of a `global' sender effect and a `global' recipient effect. Similar to the study described above \citep{snijders1999}, each transaction is associated with two macro-level groups: the group of email transactions sent by the sender and the group of email transactions addressed to the recipient. Each of these groups could have some common features, such that the variability within the group is smaller than the total variability in reply rate. In addition, this model estimates the correlation effect that is necessary for multiple role models.

The second model adds an estimation of a dyad effect to test how the decision to reply varies not only between actors, but also between an actor's different relationships. In this model we need to drop the correlation between the sender and recipient effects, otherwise the model is overspecified. The third and last model adds an estimation of the effect of each particular incoming email stimulus along with fixed indicators unique to that stimulus. 

\figuremacroW{ImgMailCopyEmbeddedModel}{Crossed classified model, adapted from \citep{snijders1999} to take into account Sender, Recipient, Dyad and Message.}{Covariates include, at the level of the }{.85}



\subsection{The Data Set}
The email communication data set used in the current study consists of a snapshot taken from a well documented version of the Enron email corpus (see section~\ref{sec:Chp3Enron}). A group of highly active and well connected email users was selected from the dataset in the following manner. Two periods were chosen, one for the stimuli and an overlapping, slightly longer period for the responses, according to the consideration mentioned above. After that, all the emails associated with these time intervals were searched for stimulus-reply pairs according to the principle explained in section~\ref{sec:Chp5Operationalizing}.\footnote{The time interval between stimulus and response was set to a maximum of two weeks.} In most dyads, only a single stimulus-response pair was identified, and the overall ratio of stimulus-response pairs found was rather low. To raise the ratio, dyads were retained only if they were associated with ten or more stimulus-response pairs of emails. The result was a network that contained one large component and several smaller disconnected components. Only members of the large component were retained. This process yielded a group of $71$ individuals, within which $396$ (unordered) dyads engaged in at least one email transaction (at least one of the actors sent an email to the other). Of these non-directed dyads, $207$ are symmetric in the sense that they exchange emails in both directions. The other $189$ dyads are asymmetric (i.e., messages flow in one direction only).

For a standard analysis of social networks, the description of nodes and related dyads would be sufficient for the construction of a social network model. But this model also requires a description of the transactions, which play an important role in the explanatory model. The data set includes $2973$ email messages sent within a period spanning the months of September to December 2001. The number of recipients per email ranges from one to eighteen, a large minority of the emails being single-recipient emails. Each multi-recipient email can now be broken down into sender-recipient pairs. In other words, we treat an email sent to $J$ recipients as if $J$ copies of the email were sent, each copy to a single recipient. However, the common identifier of the email retains the affiliation between the different copies and the original email. %This is contrary to the established practice of analysing email communication networks as a set of disconnected dyadic transactions \citep{kossinets2006}

If we break down each email that was sent to $J$ recipients into $J$ `copies' of the message, we find 4194 `copies' in the data set. Each `copy' is taken as the micro-level case. The number of replies identified in the dataset is $540$, which means that the overall proportion of replies is $12.9\%$.
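
\noindent These figures can be checked with a little arithmetic, which also gives a rough reference point on the log-odds scale used by the models below (a back-of-the-envelope calculation, not an estimate from any fitted model):
\[
\frac{540}{4194} \approx 0.129, \qquad
\log\frac{540}{4194-540} = \log\frac{540}{3654} \approx -1.91, \qquad
\frac{4194}{2973} \approx 1.41 .
\]
The last ratio is the mean number of `copies' per email message.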

\figuremacroW{sender_recipient_reply_rate}{Sender/Recipient Rate of Reply}{Aggregate rate of reply among a subset of the actors in the data set. The vertical axis denotes senders of stimuli emails, the horizontal axis denotes the recipients. The darker colors denote a higher rate of reply from recipients to senders. Note that the matrix has a weak but noticeable tendency towards symmetry.}{.60}


In terms of the crossed factor multilevel model, our dataset consists of 4194 cases (micro-level entities), each representing a dyadic sender-recipient interaction. Each case is associated with the four crossed factors (macro-level entities): 71 senders, 71 recipients, 396 dyads and 2973 messages. Figure~\ref{sender_recipient_reply_rate} compares the rate of reply for ten of the actors in the network. Take for example actor 71 and actor 46. We see that when actor 71 is the recipient of emails coming from actor 46, the rate of reply is rather high. But when actor 46 receives emails from actor 71, the rate of reply is low. The same actor can therefore play different roles, the role of recipient and the role of sender. Overall the matrix is more or less symmetric, indicating that within most pairs, if one actor has a high rate of reply to the other, the other responds in kind.

\subsection{Modelling actor, dyad and stimulus effects}
The outcome variable $y_{ij}$ denotes the existence of a reply from the $i^{th}$ recipient of the $j^{th}$ email back to its sender ($1$ denotes a reply, $0$ no reply). %Fortunately, there is a rather plausible way to establish the existence of a reply. For every email in the dataset and each of its recipients we search for a subsequent email that is sent from that recipient back to the sender and bears a subject-line identical to that of the stimulus (ignoring prefixes like {\it `re:'} or {\it `fwd:'} or combinations thereof). %\footnote{See footnote in page \pageref{note:falseposneg} for possible issues regarding false-positives or false-negatives using this technique } %\footnotemark[\ref{note:falseposneg}]
%It is of course possible that email recipients change the subject when they reply (a false negative), or hit reply on an email because they cannot bother to search the address of the actor they would like to communicate with (a false positive)}. 
The outcome variable $y_{ij}$ is assumed to have a Bernoulli distribution:

\[
 y_{ij} \sim \hbox{Binomial }(1,\pi_{ij})
\]





We assume a $\logit$ link function for the probability $\pi_{ij}$, which is related to the predictors $X_{ij}^T$ specific to the outcome through a vector of fixed parameters $\beta$ and four random effects: the effect of the stimulus email itself $u_j^{stim}$, its sender $u_{s[j]}^{sender}$, its recipient $u_{r[i,j]}^{recip}$ and the sender-recipient dyad $u_{d[r,s]}^{dyad}$, where $j$, $s$, $r$ and $d$ correspond to unique identifiers of the email, the sender, the recipient and the undirected dyad. The latter is subject to the constraint $d[r, s] = d[s, r]$.\footnote{The model was designed to be as consistent as possible with \citet{snijders1999}.}

\[
 \logit(\pi_{ij}) = \log\left(\frac{\pi_{ij}}{1-\pi_{ij}}\right) = X_{ij}^T\beta  + u_{s[j]}^{sender} + u_{r[i,j]}^{recip} + u_{d[r,s]}^{dyad} + u_j^{stim}
\]
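
To read the model on the probability scale, the link can be inverted. The following is the standard identity for the logit link, with $\eta_{ij}$ introduced here only as shorthand for the right-hand side of the equation above:
\[
 \pi_{ij} = \frac{\exp(\eta_{ij})}{1+\exp(\eta_{ij})} = \frac{1}{1+\exp(-\eta_{ij})}
\]
As a purely illustrative value, $\eta_{ij} = -2$ gives $\pi_{ij} = 1/(1+e^{2}) \approx 0.12$. Because the random effects enter additively on the logit scale, a one-unit increase in any of them multiplies the odds of a reply by $e \approx 2.72$.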

To apply the multiple role model discussed above, note that the two residuals for any particular actor $k$, $u_k^{sender}$ and $u_k^{recip}$, correspond to the two roles that the actor can play (sender and recipient). Because they are random effects of the same actor, they are assumed to have a joint normal distribution with a dispersion matrix $\Sigma$ of order 2, as presented below:


\[
\left[\begin{array}{c} u_k^{sender} \\ u_k^{recip}\end{array}\right]
 \sim N \left( 
\left[ \begin{array}{c}
  0  \\
  0
 \end{array} \right],
\left[ \begin{array}{cc}
 \sigma_s^2 & \rho\sigma_s\sigma_r  \\
 \rho\sigma_s\sigma_r & \sigma_r^2
 \end{array} \right] \right)
\]
Note that the residuals for different actors are assumed to be {\it a priori} uncorrelated. The other effects are modelled analogously, as normally distributed and mutually uncorrelated random effects.
\[
\begin{array}{l}
u_{d}^{dyad}  \sim N (0, \sigma_{dyad}^2)\\
u_{j}^{stim}  \sim N (0, \sigma_{stim}^2)
\end{array}
\]
For all models a uniform prior is assumed for the fixed effects, and a flat prior (with a lower bound of zero and an upper bound of 100) is assumed for the standard deviations of the random effects. In the multiple role model, the two residuals of each actor (corresponding to the sender and recipient roles) are assumed to follow a multivariate normal distribution with inverse covariance matrix $\Sigma^{-1}$, to which we assign a Wishart prior with two degrees of freedom and the identity matrix as the scale matrix.

The first and second models estimate only the random effects, whereas the third model includes three fixed effects. First, we expect a greater probability of a reply from recipients addressed in the {\it to} field of an email than from recipients addressed in the {\it cc} or {\it bcc} fields. A dummy variable denotes whether recipient $i$ appears in the {\it to} field of email $j$: it takes the value $1$ if the recipient is in the {\it to} field and $0$ otherwise. This is a micro-level effect unique to the email-recipient pair. 


The second fixed effect is a count of the number of recipients of each email. A lower rate of reply is expected for emails with numerous recipients (as in the case of `bulk' emails). Due to the very skewed distribution of this count (see section~\ref{sec:Chp4Distributions}), it is binned, and the ordinal position of the bin is used as the predictor in the model. The last fixed effect is the total number of emails exchanged between the actors of each dyad \textbf{\textit{prior}} to the email in question. This count is used as a proxy for the strength of the tie between sender and recipient at the moment the email was sent. Stronger ties are characterized by frequent exchanges and, following \citet{granovetter1973}, are expected to yield a higher rate of reply from the actors involved.
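
Under these definitions, the fixed part of the third model can be sketched as follows. The symbols $\mathit{to}_{ij}$, $\mathit{bin}_{j}$ and $\mathit{freq}_{ij}$ are labels introduced here for illustration only: they stand for the {\it to}-field dummy, the ordinal bin of the recipient count of email $j$, and the number of emails exchanged within the sender-recipient dyad before email $j$.
\[
 X_{ij}^T\beta = \beta_0 + \beta_1\,\mathit{to}_{ij} + \beta_2\,\mathit{bin}_{j} + \beta_3\,\mathit{freq}_{ij}
\]
In this notation, the expectations stated above correspond to $\beta_1 > 0$, $\beta_2 < 0$ and $\beta_3 > 0$.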






\section{Results}
The three models were estimated using a Markov chain Monte Carlo (MCMC) sampling approach. All models were fitted using JAGS version 2.1.0 \citep{Jags2004}, running two parallel chains, discarding the first 3000 replicates and basing inference on the next 30000 for each chain. The findings presented in table~\ref{tbl:Chp5Results} provide strong evidence that the four factors are important sources of variability in the effectiveness of emails in eliciting replies, with a steady reduction of the Deviance Information Criterion (DIC), a measure of model fit penalized for complexity that generalizes Akaike's Information Criterion \citep{spiegelhalter2002}.

\begin{table} [htbp]
\begin{center}
 \begin{tabular}{l|ccc}
   & model 1 & model 2 & model 3 \\
  \hline \\
  Constant & -2.06 (0.140)&-2.55 (0.140)&	-2.02 (0.270) \\
  {\it To} Field & & & 0.52 (0.220) \\
  Number of email recipients (\textit{binned}) & & & -0.31 (0.030) \\
  Frequency of email exchange & & & 0.02 (0.003)\\
   \hline \\
  $\sigma_{sender}$  & 0.73 (0.11)   & 0.54 (0.15) & 0.47 (0.20) \\
  $\sigma_{recip}$  & 0.71 (0.09)    & 0.54 (0.14) & 0.50 (0.20) \\
  $\rho_{send,recip}$  & 0.38 (0.15) & 0.0 & 0.0 \\
  $\sigma_{dyad}$  & & 1.09 (0.14)   & 1.82 (0.31)\\
  $\sigma_{stim}$ & & & 2.62 (0.52) \\
  \hline \\
  Deviance & 2925 & 2833 & 2700 \\
 \end{tabular}
\end{center}
\caption{Crossed multilevel models: Results}
\label{tbl:Chp5Results}
\end{table}



The findings in the first model demonstrate substantial variation between actors, as well as a positive correlation between sender and recipient effects ($\rho = 0.38$). This suggests that, at least in this specific dataset, actors who tend to reply to others also tend to elicit replies from others. The positive correlation between these two features can be seen in greater detail in figure~\ref{modl1sender_recip}. The figure also allows the identification of interesting or unusual actors. For example, actor 28 tends to have a strong sender effect (she is likely to get replies to her emails), but only a modest recipient effect.
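
As a rough illustration, the estimates of the first model can be substituted into the dispersion matrix $\Sigma$ defined earlier. This sketch assumes that $\sigma_{sender}$ and $\sigma_{recip}$ in table~\ref{tbl:Chp5Results} are the standard deviations $\sigma_s$ and $\sigma_r$, and it uses point estimates only, ignoring their uncertainty:
\[
 \rho\sigma_s\sigma_r \approx 0.38 \times 0.73 \times 0.71 \approx 0.20,
 \qquad
 \Sigma \approx \left[ \begin{array}{cc}
 0.53 & 0.20 \\
 0.20 & 0.50
 \end{array} \right]
\]
The off-diagonal entry is well below either variance, which is in line with a moderate rather than a strong association between the two roles.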

The second model suggests that adding the dyad effect reduces the variation attributed to the sender and recipient effects (both standard deviations fall to 0.54). Adding the dyad factor also necessitated the removal of the multiple-role model, so the correlation between the sender and recipient random effects was set to zero.  

The third model demonstrates that both emails and dyads are important factors governing the variability of the rate of reply, and that these are more important sources of variability than the properties of the actors. Furthermore, the fixed effects operate in the expected direction: (1) recipients in the {\it to} field are more likely to reply, (2) recipients of broadcast emails are less likely to reply and (3) recipients who have exchanged more emails with the sender in the past are more likely to reply, although the effect of each additional email is small.
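
The fixed effects of the third model can also be read on the odds scale by exponentiating the coefficients in table~\ref{tbl:Chp5Results}. This is a sketch that assumes the reported values are point estimates on the logit scale and that all other terms, including the random effects, are held constant:
\[
 e^{0.52} \approx 1.68, \qquad e^{-0.31} \approx 0.73, \qquad e^{0.02} \approx 1.02
\]
On this reading, being addressed in the {\it to} field multiplies the odds of a reply by roughly 1.7, moving up one recipient-count bin multiplies them by roughly 0.73, and each additional prior email multiplies them by roughly 1.02. The last effect accumulates: ten prior emails correspond to a factor of $e^{0.2} \approx 1.22$.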


\figuremacroWH{modl1sender_recip}{Comparing sender and recipient random effects.}{Each email user is both a potential sender and a potential recipient, and is thus associated with two deviations from the estimated average effect, one for each role.}{.60}

%\begin{figure}
%	\begin{center}
%	\mbox{
%		\includegraphics[width=3in]{modl1sender_recip.png}
%		}
%	\end{center}
%	\caption{Comparing sender and recipient random effects.}
%	\label{fig:SenderRecipientRE}
%\end{figure}


\section{Summary and Reflections}
% How do these results relate to the results in the last chapter
% How 
% The situation and the tie - teasing out the effects on the action of reciprocity. Here is somethign that we will later discuss as strategic vs. parametric constraints. 
% % Several important extensions: 1. If we get information about ppl, add homophily and at least the number of out/indegrees could be important here! 2. 



This chapter offers a general method for extracting valuable information about latent properties of actors, ties and messages from email datasets. It allows comparison within a level of abstraction (e.g., comparing actors as in figure~\ref{modl1sender_recip}) and between levels of abstraction (e.g., comparing the effects of ties, nodes and stimuli). Substantively, we see that ties are more important than nodes as sources of variability. This suggests that, rather than being a global property of individuals, replying to an email is more like social capital, a resource that `inheres in the structure of relations between actors' \citep{coleman1988}. In other words, at least within this dataset, variation between actors is less important than variation between the different ties of each actor. Put crudely, if you want to elicit a reply from your recipient, it matters less who you are or who your communication partner is; what matters is your relation to each other.
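
The relative weight of nodes, ties and stimuli can be illustrated with a rough calculation based on the third model in table~\ref{tbl:Chp5Results}. The sketch assumes that the reported $\sigma$ values are standard deviations of mutually independent random effects, so that their squares can be added, and it ignores the uncertainty of the estimates:
\[
\begin{array}{lll}
\sigma_{sender}^2 \approx 0.47^2 \approx 0.22 & & (\approx 2\%) \\
\sigma_{recip}^2 \approx 0.50^2 = 0.25 & & (\approx 2\%) \\
\sigma_{dyad}^2 \approx 1.82^2 \approx 3.31 & & (\approx 31\%) \\
\sigma_{stim}^2 \approx 2.62^2 \approx 6.86 & & (\approx 64\%) \\
\end{array}
\]
The percentages are shares of the summed random-effect variance (approximately $10.65$). Under these assumptions, the dyad accounts for well over ten times as much variance as either actor role, and the stimulus email for more still.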

Conceptually, this method is innovative on two counts. First, it considers not only how one actor is connected to another in a network, but also how one transaction triggers another in a sequence \citep{gibson2005,butts2008}. Second, it suggests a way to develop models of co-evolution processes that operate not only at the meso-level (nodes and dyads) \citep{Snijders2007}, but also between the meso-level and the micro-level (transactions). 

There are several insights gained from this study. First, it is interesting to note that sender and recipient variability are roughly the same. This means that, throughout the three models, the actor-level variability in the outcome is shared roughly equally between the sender and the recipient. We know from the literature that other types of social actions depend much more on the focal actor than on the target of the action (see, for example, \citet{snijders1999}). Furthermore, it is likely that the positive correlation between sender and recipient effects is highly contingent on the context. In some contexts the correlation may be negative if, for example, individuals occupying central roles in the organization have a high sender effect (the recipients of their emails tend to reply) but a low recipient effect (they are not necessarily responsive). 


 


%Finally, different reply probabilities to the same message have much more in common than different replies to different messages, even if those stem from the same sender and addressed to the same recipients. In other words, multi-recipient messages provide the context that affects variations across multiple dyads.

The method developed in this chapter can highlight various aspects that are specific to an organization. It can estimate variations in the positions of actors and in their relations. Most importantly, it assesses the role and the consequences of social transactions. Multi-recipient emails are treated as a common background affecting the decision making of different recipients.

Probing the relationship between social transactions (e.g., a message, a rose or a wink \citep{oakeshott1991}) and social relationships (e.g., kinship, friendship or contractual relations) is an opportunity to highlight the differences between these concepts and develop conceptual tools to address their mutual influences. While remaining agnostic about the precise nature of ties, we can still argue that ties are not defined merely by their associated transactions, and that ties and transactions can be apprehended independently. 

Arguably, this distinction is also at the very heart of the departure of the emerging field of `computational social science' from traditional studies of social networks. Whereas the former focuses on networks of transactions, interaction and communication \citep{monge2003}, the latter focuses on social ties and the social capital that `inheres' in their structure \citep{coleman1988}. By combining dyads, actors and transactions in the same model, this chapter argues for the potential of bridging these two agendas.




% ----------------------- end of thesis sub-document ------------------------
% ---------------------------------------------------------------------------